Abstract

Background: Genetic linkage maps are invaluable resources in plant research. They provide a key tool for many 
genetic applications including: mapping quantitative trait loci (QTL); comparative mapping; identifying unlinked 
(i.e. independent) DNA markers for fingerprinting, population genetics and phylogenetics; assisting genome 
sequence assembly; relating physical and recombination distances along the genome and map-based cloning of 
genes. Eucalypts are the dominant tree species in most Australian ecosystems and of economic importance 
globally as plantation trees. The genome sequence of E. grandis has recently been released providing 
unprecedented opportunities for genetic and genomic research in the genus. A robust reference linkage map 
containing sequence-based molecular markers is needed to capitalise on this resource. Several high density linkage 
maps have recently been constructed for the main commercial forestry species in the genus (E. grandis, E. urophylla
and E. globulus) using sequenced Diversity Arrays Technology (DArT) and microsatellite markers. To provide a single 
reference linkage map for eucalypts a composite map was produced through the integration of data from seven 
independent mapping experiments (1950 individuals) using a marker-merging method. 

Results: The composite map totalled 1107 cM and contained 4101 markers; comprising 3880 DArT, 213
microsatellite and eight candidate genes. Eighty-one DArT markers were mapped to two or more linkage groups, 
resulting in the 4101 markers being mapped to 4191 map positions. Approximately 13% of DArT markers mapped 
to identical map positions, thus the composite map contained 3634 unique loci at an average interval of 0.31 cM. 

Conclusion: The composite map represents the most saturated linkage map yet produced in Eucalyptus. As the 
majority of DArT markers contained on the map have been sequenced, the map provides a direct link to the 
E. grandis genome sequence and will serve as an important reference for progressing eucalypt research. 



Background 

Genetic linkage maps are valuable resources which can be 
used to provide a framework for many genomic analyses. 
Linkage maps can be used to investigate the organisation 
and evolution of genomes through comparative mapping 
[1-3] and serve as a basis for investigating phenotypic 
traits of ecological and economic importance through the 
localisation of quantitative trait loci [QTL; 4-6]. Subse- 
quently, QTL results may be used to help guide the selec- 
tion of candidate genes for association studies or be 
applied in marker-assisted breeding programmes [7,8]. 
Linkage maps can also be used to anchor physical maps 
and assist in the assembly of genome sequences [9-11]. 
The wide application of linkage maps in combination with 



* Correspondence: cjhudson@utas.edu.au 

1School of Plant Science and CRC for Forestry, University of Tasmania, Private
Bag 55 Hobart, Tasmania 7001, Australia 

Full list of author information is available at the end of the article 

BioMed Central



their value to genetics research has led to numerous link- 
age mapping projects being undertaken in plants. Detailed 
linkage maps have been produced for all of the world's 
staple cereal species [12], and in forest trees, linkage maps 
have been produced for many of the most widely-planted 
species due to their commercial importance as wood and 
fibre crops [1,13,14]. 

Grattapaglia and Sederoff [15] published the first gen- 
etic linkage map in the forest tree genus Eucalyptus in 
1994. Subsequently, many mapping pedigrees have been 
established for the purpose of linkage map construction 
and associated QTL analyses. More than 20 eucalypt 
genetic linkage maps have been reported with most 
being produced in the main commercially grown species, 
or their hybrids, from the Eucalyptus subgenus Symphyo- 
myrtus. Thus, the majority of linkage mapping projects 
have focussed on E. grandis, E. urophylla and E. globulus 
[reviewed in 16], while a smaller number of maps have 
also been produced for E. nitens [17], E. tereticornis
[18,19], E. camaldulensis [20] and for species in the 
closely related genus Corymbia [21]. 

Many early eucalypt linkage maps were constructed 
using random amplification of polymorphic DNA 
(RAPD) and amplified fragment length polymorphism 
(AFLP) molecular markers [16,22]. However, the an- 
onymous nature of these dominant markers has limited 
the transfer of linkage information between studies 
[16,23]. More informative, codominant markers such as
isozymes and restriction fragment length polymorphisms
(RFLPs) have also been used in eucalypt linkage mapping,
although their low throughput, low inter-pedigree
polymorphism and labour-intensive genotyping requirements
have limited their use [16,23]. The more recent
development of highly polymorphic microsatellite mar- 
kers made available a large potential suite of markers 
that are transferrable between species and polymorphic 
in multiple pedigrees. This enabled linkage group syn- 
teny to be established between maps containing com- 
mon microsatellite markers and the positions and 
stability of QTL across multiple species to be examined 
[e.g. 24-27]. The ability to establish linkage group syn- 
teny has also enabled moderate-density comparative 
mapping studies [23,28]. 

Recent advances in molecular methods have led to 
high-throughput genotyping systems being developed 
[e.g. 29,30]. These have made it possible to quickly gen- 
erate many hundreds of markers in single mapping pedi- 
grees and have helped facilitate the construction of high 
density linkage maps [12]. Most recently in Eucalyptus, 
Diversity Arrays Technology [DArT; 31] has been used 
to generate large numbers of molecular markers for gen- 
etic linkage mapping in several mapping pedigrees 
[e.g. 11,32,33]. The eucalypt DArT markers are highly 
transferable across species from subgenus Symphyomyr- 
tus [34] and the high-throughput array-based genotyping 
system provides wide genome coverage [35]. A key benefit
of the Eucalyptus DArT markers is the public availability
of the sequences of most of the 7680 markers
contained on the genotyping array [GenBank accession 
numbers HR865291 - HR872186], thus making it pos- 
sible to anchor DArT markers directly to the reference 
E. grandis genome sequence [v1.0 released January 2011;
36]. However, while the DArT technology offers many 
advantages, the DArT markers do suffer some limita- 
tions due to their dominant nature. For example, the in- 
complete segregation information provided by those 
DArT markers segregating in a 3:1 ratio (intercross) 
results in an exponential increase of marker-ordering 
calculations compared to fully-informative co-dominant 
markers [37]. Co-dominant markers also provide more 
complete information in QTL mapping studies [e.g. 
allowing estimation of additive and dominant allelic 



effects; 38] and are more useful in some genetic ana- 
lyses, such as estimating population genetic parameters 
(e.g. inbreeding levels), relative to dominant marker 
types such as DArT. In addition, the DArT marker assay 
can be subject to cross-hybridization from duplicated 
loci in the genome, although most such artifacts can be 
excluded by preselecting markers exhibiting Mendelian 
segregation ratios in mapping pedigrees. 

At present, DArT markers have been used to construct 
linkage maps in seven independent E. globulus and/or 
E. grandis x E. urophylla hybrid family mapping pedi- 
grees [11,32,33]. All of these maps also contain a variable 
number of co-dominant microsatellite markers, which 
provide important links to many earlier eucalypt linkage 
maps. In the two largest mapping pedigrees (more than 
500 individuals each), 1010 [32] and 2229 [33] DArT 
markers were mapped at sub-centiMorgan marker
densities and collectively more than 4000 DArT and 
microsatellite markers have been mapped in the seven 
pedigrees. 

All DArT marker based linkage maps were constructed 
using the program JoinMap 4.0 [37]. This program is one 
of the most commonly used linkage mapping programs 
and appears to be the only software available for building 
linkage maps using the combined segregation data from 
multiple populations [39-41]. However, it is presently not 
feasible to combine the segregation data contained within 
the seven eucalypt mapping families described above (collectively
1950 individuals), and successfully order such
large numbers of markers within linkage groups (up 
to ~500) due to computational limitations (Van Ooijen
pers. comm.). To circumvent the limitations of traditional
segregation-based methods of linkage map construction, 
alternative marker-merging strategies have been devel- 
oped. A so-called 'composite map' can be produced in 
which markers from individual component maps are 
merged into a single map based on their position relative 
to common anchor loci. For example, the 'neighbours' 
marker-merging approach of Cone et al. [42] and the 
marker-merging method implemented in the PhenoMap 
program (GeneFlow Inc. USA) have been used to success- 
fully construct high density composite maps containing 
several thousand markers in a number of plant species; in- 
cluding Sorghum [43], barley [41,44,45] and maize [42,46]. 

In this study, a marker-merging method was used to
construct a high-density DArT and microsatellite marker 
composite linkage map from seven independently con- 
structed maps. Recent comparative mapping analyses 
using 236 to 393 markers shared between three of the 
maps [see 32] showed that these linkage maps exhibited 
high synteny (> 93.4% markers occurring on the same 
linkage groups) and high colinearity (> 93.7% markers 
having the same order within linkage groups). This indi- 
cated that it would be possible to merge markers from 
several component maps into a single high quality map
featuring robust marker-order together with very high 
marker density. It is expected that this composite map 
will facilitate marker and map information exchange and 
serve as a valuable reference for species in the subgenus 
Symphyomyrtus. 

Methods 

The following terms are used to describe the various
types of linkage maps reported in this paper: (1) sex-averaged
map - a consensus of individually constructed
male and female maps, built in a single family using segregation
data from both parents; (2) consensus map - a
consensus of multiple individually constructed male and
female maps, built in multiple families (e.g. F2 double
pseudo-backcross) using segregation data from all of the
families; and (3) composite map - an integrated map of
multiple sex-averaged and/or consensus maps, built
using a marker-merging method.

Component maps 

The composite map was built using an E. grandis x
E. urophylla F2 double pseudo-backcross pedigree consensus
linkage map [both species from section Latoangulatae;
33] plus one E. grandis x E. urophylla sex-averaged map
constructed in an F1 hybrid pedigree [11] and five pure-species
E. globulus [section Maidenaria; 32] sex-averaged
linkage maps constructed in either outcrossed F2 or F1 families
(hereafter referred to as 'component' maps). Component
map family sizes ranged from 172 (GLOB-F2-1) to
547 (GU-SA) and collectively contained 1950 individuals
(Table 1). The component maps were constructed
by different researchers. All used JoinMap 4.0 [37] with
marker-ordering within linkage groups (LGs) estimated
using the regression algorithm of Stam [47] combined
with the Kosambi mapping function. All component



maps comprised 11 linkage groups in accordance with 
the haploid chromosome number of Eucalyptus [48]. 

Before building the composite map, marker names were 
standardised across maps, homologous linkage groups were 
identified using common (anchor) loci and marker co- 
linearity between component maps was visually inspected 
in MapChart [49]. Map data was supplied for both frame- 
work (1032-marker) and comprehensive (2484-marker) 
maps built in the GU-Emb family [see 11]. Based on the 
level of marker-order agreement between linkage groups 
from these maps with other component maps, either GU-Emb
framework (LGs 1, 3, 5, 7 and 9) or comprehensive
(LGs 2, 4, 6, 8, 10 and 11) linkage groups were included in 
composite map construction. Five linkage groups from 
three of the smaller E. globulus mapping families (Table 1) 
were found to have substantial regions of non-colinearity 
(discordant marker-orders) with other component maps. 
Consequently, LG6 and LG10 from the GLOB-F1-1 map,
LG4 and LG9 from the GLOB-F1-4 map and LG4 from the
GLOB-F1-5 map were excluded from composite map
construction. 

The number of markers included for composite map 
construction ranged from 498 (GLOB-F1-4) to 2290
(GU-SA; Table 1). In total, this consisted of 4350 indi- 
vidual markers, including: 4089 DArT, 253 microsatel- 
lites and eight mapped genes. Ninety-six markers (2.2% 
of the total number of markers; termed 'multicopy' mar- 
kers) were mapped to two or more linkage groups across 
component maps. This resulted in the 4350 individual 
markers being mapped to 4457 positions. Of these 4457 
positions, 1960 could be considered to be bridging loci, 
meaning that these markers had been mapped to syn- 
tenic linkage groups in two or more component maps 
and would serve as anchor loci during composite map 
construction. Conversely, 2497 marker positions were 
unique to single component maps. 



Table 1 Component map details

                                                                                      Markers mapped (percentage of unique markers in pedigree)
Linkage map a                                              Map abbreviation  n    cM    MMI   DArT        SSR        Gene      Total
E. grandis x E. urophylla SA double pseudo-backcross F2 b  GU-SA             547  1107  0.51  2229 (45%)  59 (46%)   2 (100%)  2290 (45%)
E. grandis x E. urophylla Embrapa F1 c,e                   GU-Emb            177  1229  0.78  1617 (41%)  193 (77%)  0         1810 (44%)
E. globulus Lighthouse F2 d                                GLOB-LH           503  1151  1.21  1010 (27%)  50 (12%)   0         1060 (27%)
E. globulus FAM1 F1 d                                      GLOB-F1-1         184  1033  1.97  571 (14%)   4 (0%)     2 (0%)    577 (14%)
E. globulus FAM4 F1 d                                      GLOB-F1-4         184  1137  2.46  488 (10%)   6 (0%)     4 (25%)   498 (10%)
E. globulus FAM5 F1 d                                      GLOB-F1-5         183  1055  2.09  600 (22%)   4 (0%)     2 (0%)    606 (21%)
E. globulus FAM1 F2 d                                      GLOB-F2-1         172  1258  2.73  660 (18%)   30 (30%)   5 (40%)   695 (18%)

Summary of the component maps used to construct the composite map. For each map, progeny size (n), map length (cM; total for all 11 linkage groups), mean
marker interval (MMI; average for all 11 linkage groups) and total number of mapped markers (using only those linkage groups included in composite map
construction; see Methods) are given. For DArT, microsatellite (SSR) and gene markers mapped on each component map, the percentage of markers unique to
that map (i.e. not mapped in any of the six other component maps) is given in parentheses. a Cross details and reference; b Kullan et al. [33], c Petroli et al. [11]
and d Hudson et al. [32]. e Data for the E. grandis x E. urophylla Embrapa F1 component map calculated using a combination of framework and comprehensive
linkage groups (see Methods).
Composite map construction

The composite linkage map was constructed at Diver- 
sity Arrays Technology (DArT) Pty Ltd (Canberra, 
Australia) using specially developed R scripts which 
merged component map markers into the composite 
map based on their relative map positions. The E. 
grandis x E. urophylla SA F2 (GU-SA) linkage map
was used as the seed-map (i.e. the 'fixed backbone' to 
which markers from other component maps were 
added) due to it having the largest progeny size, the 
largest number of both mapped and unique markers 
(Table 1) and high overall marker colinearity to the 
11 main superscaffolds of the assembled E. grandis 
genome sequence [33,36]. The procedure for building 
each composite map linkage group was as follows. 
Firstly, the number of common markers in each seed- 
map - component map linkage group comparison 
was identified. Spearman rank marker-order correlations
were then estimated and a heuristic 'fit value'
for each comparison was calculated as: Fit value = correlation
x log(number of common markers), where
the second term rewards the number of common
markers with a diminishing-returns function. Following
selection of the component map linkage group
with the highest fit value, unique markers (i.e. those
not mapped on the seed linkage group, or the 'building'
composite linkage group in following rounds)
were added to the seed linkage group (or 'building'
composite map linkage group) using linear regression.
Here, the slope (m) and intercept (c) calculated from
fitting the positions of common markers on the seed
linkage group (pc) to their positions on the selected
component map linkage group (pi), i.e. pc = m x pi + c,
were used to calculate the positions of unique component
map markers added to the seed linkage group.
Once this first round was completed, the remaining 
component linkage groups were compared to this 
new 'building' composite map linkage group and the 
process was repeated. This continued until all unique 
markers had been added from remaining component 
maps which shared at least three common markers 
with the building composite map linkage group and 
had a marker-order correlation coefficient > 0.50. This 
process was repeated for each linkage group to yield 
the final composite map of 11 linkage groups. Mar- 
kers which mapped to the distal ends of composite 
linkage groups and which had relatively large inter- 
marker intervals (> 5 cM) and poor support (e.g. 
mapped in one component map only) were removed. 
The numbering and orientation of linkage groups fol- 
lowed the convention established in Brondani et al. 
[23]; this also corresponds to the numbering of pseu- 
dochromosome assemblies in the E. grandis genome 
sequence [36]. 
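
The merging procedure described above can be sketched in Python for a single linkage group. This is an illustrative sketch only, not the R scripts used at DArT Pty Ltd: the function names (average_ranks, fit_value, merge, build_composite), the use of the natural logarithm in the fit value, the handling of ties and the toy marker positions are all assumptions, since the text does not specify them. Maps are represented as dictionaries of marker name to position (cM).

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation, computed as the Pearson correlation of ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sxx = sum((a - mean_x) ** 2 for a in rx)
    syy = sum((b - mean_y) ** 2 for b in ry)
    if sxx == 0 or syy == 0:  # all common markers co-located in one map
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def fit_value(building, component):
    """Return (fit, rho, n_common), or None with fewer than three common markers."""
    common = sorted(set(building) & set(component))
    if len(common) < 3:
        return None
    rho = spearman([building[k] for k in common], [component[k] for k in common])
    # Natural log is an assumption; the log base is not stated in the text.
    return rho * math.log(len(common)), rho, len(common)


def merge(building, component):
    """Add markers unique to `component` at positions pc = slope * pi + intercept."""
    common = sorted(set(building) & set(component))
    pi = [component[k] for k in common]
    pc = [building[k] for k in common]
    mean_pi, mean_pc = sum(pi) / len(pi), sum(pc) / len(pc)
    slope = sum((a - mean_pi) * (b - mean_pc) for a, b in zip(pi, pc)) / sum(
        (a - mean_pi) ** 2 for a in pi)
    intercept = mean_pc - slope * mean_pi
    merged = dict(building)
    for marker, pos in component.items():
        if marker not in merged:
            merged[marker] = slope * pos + intercept
    return merged


def build_composite(seed, components, min_rho=0.50):
    """Repeatedly merge the eligible component with the highest fit value."""
    building, remaining = dict(seed), list(components)
    while remaining:
        scored = []
        for idx, comp in enumerate(remaining):
            result = fit_value(building, comp)
            if result is not None and result[1] > min_rho:
                scored.append((result[0], idx))
        if not scored:
            break  # nothing left that shares >= 3 markers with rho > min_rho
        _, best = max(scored)
        building = merge(building, remaining.pop(best))
    return building


# Toy linkage group (hypothetical marker names and positions).
seed = {"A": 0.0, "B": 10.0, "C": 20.0, "D": 30.0}
comp1 = {"A": 0.0, "B": 5.0, "X": 7.5, "C": 10.0, "D": 15.0}
comp2 = {"B": 1.0, "X": 2.0, "Y": 3.0, "D": 5.0}
composite = build_composite(seed, [comp2, comp1])
print(sorted(composite.items(), key=lambda item: item[1]))
```

In this toy example comp2 shares only two markers with the seed linkage group and so cannot be used in the first round; once comp1 has contributed marker X, comp2 shares three markers with the 'building' linkage group and its unique marker Y can be placed. This mirrors why the order of merging, driven by the fit value, matters.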



Composite map features 

Following composite map construction, marker-order 
correlations between composite and component map 
linkage groups were calculated in SAS 9.2 (SAS Institute, 
Cary, USA) using the PROC CORR Spearman function. 
To test whether multicopy markers were distributed 
equally across linkage groups, a χ2 test was used to compare
the observed versus expected number of multicopy
marker positions occurring on each linkage group. The 
expected number of multicopy markers per linkage 
group was calculated as; (total number of multicopy 
marker positions in the composite map/total number of 
DArT marker positions in the composite map) x number 
of DArT marker positions per linkage group for that 
linkage group. The BLAST server available at Phytozome 
[36] was used to search for DArT marker duplications. 
The bl2seq tool at NCBI [50] was used to examine 
DArT marker sequence similarity/redundancy. All 
graphical representations of linkage maps were drawn 
using MapChart [49]. 
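
The expected-count calculation and test of multicopy marker distribution described above can be illustrated with a short Python sketch. The per-linkage-group counts are transcribed from Table 2; the LG5 entries (366 DArT positions, 24 multicopy positions) are inferred here from the column totals (3970 and 171) and should be treated as an assumption. The function names are hypothetical, and the closed-form tail probability for even degrees of freedom is used only to keep the example free of external packages; the software actually used for this test is not stated in the text.

```python
import math

# DArT marker positions and multicopy (MC) DArT marker positions for LG1..LG11,
# transcribed from Table 2 (LG5 values inferred from the column totals).
dart_positions = [250, 451, 429, 219, 366, 408, 305, 540, 312, 336, 354]
mc_positions = [5, 18, 21, 12, 24, 15, 18, 19, 10, 12, 17]


def expected_counts(observed, weights):
    """Expected = (total observed / total weight) x weight for each linkage group."""
    ratio = sum(observed) / sum(weights)
    return [ratio * w for w in weights]


def chi_square_statistic(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_upper_tail(x, df):
    """P(X >= x) for a chi-square variable with an even number of df."""
    if df % 2 != 0:
        raise ValueError("closed form used here requires even df")
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))


expected = expected_counts(mc_positions, dart_positions)
statistic = chi_square_statistic(mc_positions, expected)
df = len(mc_positions) - 1  # 11 linkage groups -> 10 degrees of freedom
p_value = chi_square_upper_tail(statistic, df)
print(f"chi-square = {statistic:.2f}, df = {df}, P = {p_value:.2f}")
```

With these inputs the statistic is close to 13.0 with P of about 0.22 on 10 degrees of freedom, in line with the values reported in the Results.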

Results 

Composite map details 

A total of 4101 individual markers, comprising 3880 
DArT markers, eight gene-based markers and 213 
microsatellite markers were included in the composite 
map. The composite map totalled 1107 cM which was 
within the range of component map lengths (1033- 
1258 cM; Table 1) and contained only eleven marker 
intervals > 3 cM; with a maximum marker interval of 
5.9 cM. The composite map contained 81 multicopy 
DArT markers (2.1% of total DArT markers) which were 
mapped to 171 map positions. Most multicopy markers 
occurred on two linkage groups only; however, one marker
(ePt-574238) mapped to three linkage groups while
four markers (ePt-503174, ePt-568818, ePt-637610, ePt- 
637861) mapped to four linkage groups. This resulted in 
the 4101 markers being mapped to 4191 positions 
(Table 2). Over half (2171 or 53%) of the markers 
mapped to these 4191 map positions had been mapped 
in a single component map only (i.e. were not shared
among multiple component maps). Approximately 13% 
of DArT markers mapped to identical positions in the 
composite map. Therefore, the map contained 3634 
unique map loci with an average interval of 0.31 cM. 
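
The marker and position counts given above can be cross-checked with simple arithmetic, sketched below in Python. The figure of 76 markers on two linkage groups is derived here (81 multicopy markers minus the one on three and the four on four linkage groups) rather than stated in the text. Computing the average interval over the number of intervals (unique loci minus one per linkage group) is an assumption; it is one reading that reproduces the reported 0.31 cM.

```python
# Counts taken from the text; the two-linkage-group count (76) is derived.
total_markers = 4101
multicopy_markers = 81
markers_by_lg_count = {2: 81 - 1 - 4, 3: 1, 4: 4}  # linkage groups hit -> markers

# Each multicopy marker contributes one position per linkage group it maps to.
multicopy_positions = sum(n_lgs * n for n_lgs, n in markers_by_lg_count.items())

# Single-copy markers contribute one position each.
total_positions = (total_markers - multicopy_markers) + multicopy_positions

# Average interval, assuming intervals = unique loci minus one per linkage group.
map_length_cm = 1107
unique_loci = 3634
linkage_groups = 11
average_interval = map_length_cm / (unique_loci - linkage_groups)

print(multicopy_positions, total_positions, round(average_interval, 2))
```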

The number of multicopy DArT marker positions on 
each linkage group ranged from 5 to 24 and represented 
1.9-6.4% of the total number of DArT markers mapped 
per linkage group (Table 2). Although LG5 and LG7
contained a larger proportion of multicopy DArT marker
positions than other linkage groups (e.g. LG1 contained only 5 multicopy
DArT marker positions, or 1.9% of the total number of
DArT marker positions; Table 2), the proportion of multicopy
DArT marker positions found on each linkage
Table 2 Composite map summary

                 Markers mapped                  Average marker   MC DArT
LG      cM       DArT   SSR   Genes   Total      interval (cM)    pos. a
1       93.8     250    12    0       262        0.42             5
2       102.1    451    29    0       480        0.24             18
3       105.6    429    18    2       449        0.28             21
4       80.9     219    9     3       231        0.41             12
5       95.9     366    8     0       374        0.30             24
6       125.3    408    43    1       452        0.31             15
7       87.7     305    9     1       315        0.33             18
8       137.3    540    26    0       566        0.28             19
9       82.9     312    20    0       332        0.29             10
10      97.8     336    20    1       357        0.30             12
11      97.3     354    19    0       373        0.31             17
Total   1106.5   3970   213   8       4191       0.31             171 b

Summary of the composite map, including the number of mapped markers,
length and average marker intervals by linkage group (LG). a MC DArT pos.
indicates the number of multicopy (MC) DArT marker positions (pos.) occurring
on each linkage group. b The 171 multicopy DArT marker positions represent
81 multicopy DArT markers (see Additional file 1).



group did not significantly differ from that expected by 
chance across all linkage groups (χ2 = 12.99, P = 0.22,
df = 10). There was no trend within linkage groups for
multicopy DArT markers to be clumped in either distal 
or central linkage group areas (data not shown). Com- 
posite map marker details, component map(s) marker 



origins and multicopy DArT marker information is pre- 
sented in Additional file 1. 

Composite - component map colinearity 

Colinearity between component and composite map 
linkage groups can be viewed graphically in Figure 1 
(for the GLOB-LH map) and in Additional file 2 (all 
component maps). Pair-wise linkage group marker-order
correlations were generally high (greater than
0.90; Table 3) reflecting the high colinearity shown 
between common markers (Figure 1 and Additional 
file 2). However, a small degree of non-colinearity 
did occur between all component maps and the com- 
posite map. Eleven component map linkage groups 
had marker-order correlations of less than 0.90 
(Table 3); however, these linkage groups were either
(1) identified as having poor marker colinearity with 
other component maps prior to composite map con- 
struction and excluded from analysis (five linkage 
groups with gray shading in Table 3), or (2) marker- 
order information from these linkage groups was not 
incorporated during composite map construction 
(correlation value without asterisk; six linkage groups;
Table 3) because markers from these maps had previously
been added from other linkage groups having better
fit values. Thus, these poorly correlated linkage
groups did not adversely affect the composite map 
marker-order. For each linkage group, the average 
pair-wise marker-order correlation between the 



Table 3 Composite - component map marker-order correlation coefficients

Composite   Component map
map LG      GLOB-LH   GU-Emb    GLOB-F2-1   GLOB-F1-1    GLOB-F1-4   GLOB-F1-5    Average a   Average b
1           0.98*     0.56 F    0.99*       0.99*        0.98*       0.95*        0.91        0.98
2           0.95*     0.95 C*   0.93*       0.95*        0.98*       0.85         0.94        0.96
3           0.91*     0.99 F*   0.97*       0.99*        0.97*       0.99*        0.97        0.97
4           0.98*     0.74 C    0.96*       0.89         0.79 ex     0.19 ns,ex   0.76        0.97
5           0.9?*     0.92 F    0.96*       0.96*        0.99*       0.96*        0.96        0.97
6           0.9?*     0.99 C*   0.99*       0.63 ex      0.99*       0.86         0.91        0.99
7           0.9?*     0.65 F    0.99*       0.98*        0.96*       0.91*        0.91        0.96
8           0.98*     0.99 C*   0.99*       0.66         0.94*       0.99*        0.93        0.98
9           0.99*     0.97 F*   0.98*       0.97*        0.65 ex     0.97         0.92        0.98
10          0.95*     0.98 C*   0.97*       0.35 ns,ex   0.92        0.97*        0.86        0.97
11          0.99*     0.99 C*   0.99*       0.97*        0.96        0.99*        0.98        0.99
Average c   0.9?      0.88      0.97        0.85         0.92        0.87
Average d   0.9?      0.98      0.97        0.97         0.97        0.96

Marker-order correlations between composite map and component map linkage groups; the GU-SA component map is not shown as this map was used as the
seed-map (i.e. provided a fixed order and all correlations were 1.0). For the GU-Emb component map, superscript letters indicate whether the framework (F) or
comprehensive (C) linkage group was used in map construction. Component map linkage groups initially excluded from composite map construction are
indicated by an ex superscript. An asterisk following the correlation value indicates that marker-order information from the component map was incorporated during
construction of the composite map linkage group. Apart from two correlations (indicated by an ns superscript), all correlations were significant at α < 0.05. Averages:
a calculated using all six component maps; b calculated using only those linkage groups included in composite map construction (marked with an asterisk);
c calculated using all 11 linkage groups; d calculated using only those linkage groups included in composite map construction (marked with an asterisk).



Figure 1 Marker colinearity between the GLOB-LH component map (left) and composite map (right). Lines between each homologous 
linkage group pair indicate the positions of common markers. The scale bar is in Kosambi's centiMorgans. 



composite map and those component maps included 
in map construction ranged from 0.96 to 0.99 (Average b
column; Table 3).

DArT marker duplications 

Although not a main focus of this study, evidence for 
the occurrence of duplicated DArT marker loci within 
the assembled E. grandis genome sequence [36] was 
investigated for the five multicopy markers which had 
been mapped to three or more linkage groups. Two of 
these markers (ePt-637610 and ePt-637861; see Add- 
itional file 1) mapped to the same map position on each 
of four linkage groups (LGs 2, 3, 5 and 8) and were 
found to be redundant markers (i.e. identical sequences)
based on their marker sequence similarity (bl2seq: 583/ 



606 base-pair similarity, e-value: 0.0). Of the four
unique multicopy markers, three were found to have
duplicated loci within the E. grandis genome sequence.
In each case, the positions of duplicated loci detected in 
the E. grandis genome sequence corresponded to the 
linkage groups to which the marker was mapped. 

Discussion 

Composite map construction 

Data from seven component maps were integrated into 
a single composite map which represents the highest 
density map yet produced in Eucalyptus. A major advantage
of the marker-merging method used in this study
was the substantial time and labour savings made when 
compared to the effort required to produce comparable 
maps using traditional, segregation-based methods. For
example, Li et al. [40] constructed a 2111 marker com- 
posite map from four barley mapping pedigrees and 
reported that it took 'several thousand hours' of comput- 
ing time. In a larger barley study, Wenzl et al. [41] pro- 
duced a 2935 loci composite map from ten mapping 
populations using JoinMap 3.0 [51] in combination with 
specially built Perl scripts and reported that the project 
required several months of semi-manual data processing 
[41]. In contrast, the composite map produced in this 
study was built in a single day. 

Utility of the composite map 

As sequences are available for the majority of DArT mar- 
kers on the map (91%; data not shown), the composite 
map provides a direct link to the E. grandis genome se- 
quence [36]. We have made use of this link to search the 
E. grandis genome sequence for candidate genes asso- 
ciated with QTL locations and to facilitate the placement 
of candidate genes in the component linkage maps with- 
out the need for time consuming marker development 
and genotyping. Sequence-based linkage maps have also 
provided useful tools to aid in the assembly of genome 
sequences [e.g. 52,53] and can be particularly beneficial in 
taxa (such as eucalypts) which have a relatively small gen- 
ome size. For example, during the assembly of the 
E. grandis genome sequence, a DArT linkage map was 
valuable in guiding contigs into the 11 main pseudochro- 
mosomes [16]. However, not all contigs could be aligned 
and approximately 12% of the 693 Mbp E. grandis genome 
sequence remains unassembled in more than 4900 small 
unlinked scaffolds [54]. With the composite map contain- 
ing many more DArT markers (1600+) than the linkage 
map used to aid genome assembly, the composite map 
markers may provide further positional information and 
help to anchor some of the unlinked scaffolds and refine 
the current E. grandis genome sequence. 

Over half (53%) of the markers placed in the composite 
map originated from a single component map (i.e. were
not shared among multiple component maps). Therefore, 
the ability to determine the relative positions of markers 
mapped in different maps has been greatly enhanced 
through the integration of these data into a single map. This
has already proven advantageous to our research group, 
with the composite map being used to quickly identify the 
linkage relationships of microsatellite markers used in 
population genetic studies. Although now a relatively sim- 
ple task, it was previously necessary to consult multiple 
linkage maps and assess their colinearity to obtain this 
same information. Furthermore, any marker developed in 
eucalypts which has known sequence, can now potentially 
be found in the eucalypt genome sequence and then 
aligned against the reference map in order to estimate its 
distance to other markers in units of recombination (cM); 



which are evolutionarily meaningful units compared to base
pair distances. Additionally, it is also important to under- 
stand the relationship between physical map (i.e. genome) 
and genetic map distances as this can have implications for 
map-based cloning efforts and/or marker-assisted selection. 
For example, uneven recombination rates across a genome 
[12,55] may result in physically distant markers appearing 
to be genetically close to each other, or vice versa. In euca- 
lypts, Kullan et al. [33] recently compared 153 linkage map 
intervals of approximately 1 cM against contigs of the 
E. grandis genome and found that the genetic map to phys- 
ical distance relationship varied considerably; ranging from 
100 kb to 2.4 Mbp per 1 cM. Therefore, the composite 
map will be useful to provide further insight into the rela- 
tionship between physical and genetic map distance in 
addition to identifying hot (or cold) spots of recombination. 
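
As a minimal sketch of how the relationship between physical and genetic distance might be examined once markers are anchored to both the genome sequence and a linkage map, the Python snippet below computes kb per cM for consecutive marker intervals. The function name and the marker names and positions are hypothetical and were chosen only so that the two intervals span the 100 kb to 2.4 Mbp per cM range quoted above; they are not data from this study.

```python
def kb_per_cm(markers):
    """Return (left, right, kb per cM) for consecutive markers on one linkage group.

    `markers` is a list of (name, genetic position in cM, physical position in bp).
    Intervals with zero genetic distance are skipped because no rate can be
    estimated for co-located markers.
    """
    ordered = sorted(markers, key=lambda marker: marker[1])
    rates = []
    for (left, cm1, bp1), (right, cm2, bp2) in zip(ordered, ordered[1:]):
        genetic = cm2 - cm1
        if genetic <= 0:
            continue
        physical_kb = abs(bp2 - bp1) / 1000
        rates.append((left, right, physical_kb / genetic))
    return rates


# Hypothetical markers: (name, cM, bp).
example = [
    ("m1", 0.0, 1_000_000),
    ("m2", 1.0, 1_100_000),  # 100 kb over 1 cM: relatively high recombination
    ("m3", 2.0, 3_500_000),  # 2.4 Mbp over 1 cM: relatively low recombination
]
rates = kb_per_cm(example)
for left, right, rate in rates:
    print(f"{left}-{right}: {rate:.0f} kb/cM")
```

Intervals with a low kb/cM value would be candidate recombination hot spots and those with a high value candidate cold spots, subject to the accuracy of marker order in both the map and the assembly.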

A key use of the composite map will be for compari- 
son of QTL and candidate gene positions detected 
across variable genetic backgrounds and/or environ- 
ments in different studies. This has previously been lim- 
ited due to a lack of common markers being shared 
between maps [23]. For example, Thumma et al. [27] 
detected multiple co-locating growth-related QTL on 
LG5 in E. nitens but could not accurately compare the 
position of these QTL to similar growth-related trait QTL 
detected on this same linkage group in two other studies 
[24,56]. Although most of the markers contained on the 
composite map are DArT markers, which to date have 
only been mapped in the pedigrees included in this 
study, the map does contain several hundred microsatel- 
lite markers (213) which will enable synteny and co- 
linearity to be established with many earlier linkage 
maps used for QTL detection; e.g. 13 out of 22 earlier 
studies have mapped a variable number of microsatel- 
lites [16]. This will enable QTL to be aligned against the 
composite map which may provide deeper insight into 
the genetic control of phenotypic traits in the genus. For 
example, following the construction of an integrated 
map for melon (Cucumis melo) which used data from 
eight independent mapping experiments, it was possible 
to align 370 QTL detected for 62 traits from 18 experi- 
ments [57]. Through this alignment, QTL detected in 
different studies for economically important traits were 
found to co-locate [57], providing supporting evidence 
to substantiate the biological basis of the observed 
marker-trait association [7,8]. 

As in all linkage mapping studies, it is important to con- 
sider both the quality of the map produced and any spe- 
cific map characteristics. In the alignment of 6480 DArT 
marker sequences against the E. grandis genome sequence 
[36], Petroli et al. [11] reported that although the majority 
of markers (4189) occurred at a single genome position 
with high support, many marker sequences (2291), albeit 
at lower confidence, also exhibited similarity to a second 
genome position and that about half of these genome 
regions contained repeat elements. Furthermore, prelim- 
inary analysis of the E. grandis genome sequence suggests 
that (as has been observed in some Rosid genomes) a 
whole-genome duplication event has occurred in the 
lineage (Myrtales) subsequent to the ancient hexaploidy 
event shared by all rosids (Myburg et al., unpublished). 
Such whole-genome, as well as segmental, duplication 
events will affect thousands of marker loci, but most 
would be expected to diverge in sequence with evolution- 
ary time yielding mostly unique marker loci. Thus, the 
presence of multicopy markers (representing putatively 
duplicated loci) in the composite map was not unex- 
pected. It is worth noting that in the construction of each 
component map, only those markers which segregated as 
a single Mendelian locus were mapped. Therefore, in the 
event of a marker duplication being present within a pedigree, 
only one locus could be polymorphic in order for 
that marker to produce a single-locus segregation ratio. 
Consequently, it is likely that only a subset of the dupli- 
cated loci present within the eucalypt genome have been 
identified in the composite map. Given that the PstI 
enzyme used in the complexity reduction step of DArT 
marker development [35] preferentially produces markers 
located in hypomethylated, gene rich regions [55], and 
that many DArT markers contain protein coding 
sequences [33], it is possible that some of the multicopy 
markers identified may be associated with different gene 
family members and/or be part of larger duplicated 
regions. Further studies are required to examine the full 
extent and evolution of the duplicated loci. We also 
expected some marker redundancy (markers with the 
same sequence) among the 3880 composite map DArT 
markers; an issue which arises due to the process by which 
DArT markers are generated, resulting in the same ampli- 
fied genomic fragment being represented more than once 
on the genotyping array [31,35]. Therefore, identical 
clones (e.g. the same DArT fragment, but with different 
DArT marker names) are expected to produce identical 
genotype scores and should map to identical (or near 
identical) map positions; as found for the markers ePt- 
637610 and ePt-637861 identified as identical clones in 
this study. 
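
The distinction drawn above between multicopy markers (one marker name on two or more linkage groups) and markers sharing an identical map position can be sketched as follows. This is only an illustration under assumed inputs: the record layout, the marker names and the positions are hypothetical and do not come from the composite map.

```python
from collections import defaultdict

# Hypothetical records: (marker name, linkage group, position in cM).
records = [
    ("marker_A", 1, 0.0),
    ("marker_B", 1, 0.0),   # same position as marker_A (possible identical clone)
    ("marker_C", 1, 0.4),
    ("marker_D", 2, 3.1),
    ("marker_D", 7, 12.5),  # same marker mapped on a second linkage group
]


def multicopy_markers(records):
    """Return names of markers mapped to two or more linkage groups."""
    groups = defaultdict(set)
    for name, linkage_group, _position in records:
        groups[name].add(linkage_group)
    return sorted(name for name, lgs in groups.items() if len(lgs) >= 2)


def count_unique_loci(records):
    """Count distinct (linkage group, position) pairs."""
    return len({(lg, pos) for _name, lg, pos in records})


n_markers = len({name for name, _lg, _pos in records})
n_positions = len(records)

print(multicopy_markers(records))  # ['marker_D']
print(n_markers, n_positions, count_unique_loci(records))  # 4 5 4
```

In this toy example four markers occupy five map positions but only four unique loci, which mirrors how the number of markers, map positions and unique loci can differ in the composite map. Co-location alone does not show that two markers are identical clones; sequence comparison would be needed to confirm this.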

The marker-merging method used in this study took ad- 
vantage of the fact that individual component maps were 
constructed using high marker-ordering stringency which 
resulted in linkage maps having robust marker-orders 
[32]. The comparison of the composite map marker-order 
against individual component maps gives an indication of 
the quality of the composite map. Marker-order correla- 
tions were mostly excellent with high pair-wise linkage 
group marker-order correlations found in most compari- 
sons. For example, in 48 out of 66 pair-wise comparisons 
the marker-order correlation exceeded 0.95. Despite these 
high correlations, most component maps did exhibit some 
marker-order inconsistencies with the composite map. A 
number of (mostly) single marker-order inconsistencies 
did occur over large distances, but most marker-order dis- 
agreements occurred among tightly grouped markers in 
regions of less than 5 cM. Although it is possible that 
some of these marker-order differences could be real and 
represent local chromosomal rearrangements or marker 
duplications between the different mapping pedigrees 
and/or species, they are more likely to reflect marker- 
order inaccuracies within any of the component maps or 
simply be artefacts of the statistical uncertainty associated 
with ordering tightly linked markers [see 58]. While users 
of this map should be aware of these limitations and how 
they may affect marker ordering, overall, the generally 
high marker-order correlations observed and the exclu- 
sion of component map linkage groups having poor mar- 
ker colinearity from initial composite map construction 
(and thus not adversely affecting composite map marker-order) 
suggest that the composite map is of sufficiently 
high quality to facilitate the transfer of genetic information 
between studies. 
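
A marker-order comparison of the kind described above can be sketched for one linkage group as follows. The sketch assumes a Spearman rank correlation over markers shared by both maps, which may differ from the statistic actually used in this study; the function names and the example positions are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("all markers share one position in at least one map")
    return cov / (var_x * var_y) ** 0.5


def marker_order_correlation(component, composite):
    """Spearman correlation of positions for markers common to both maps."""
    shared = sorted(set(component) & set(composite))
    if len(shared) < 3:
        raise ValueError("too few shared markers to compare marker order")
    x = [component[m] for m in shared]
    y = [composite[m] for m in shared]
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical positions (cM) on one linkage group; m2 and m3 swap order.
component_lg = {"m1": 0.0, "m2": 2.1, "m3": 2.4, "m4": 9.8, "m5": 15.0}
composite_lg = {"m1": 0.0, "m2": 1.9, "m3": 1.7, "m4": 10.2, "m5": 14.1, "m6": 20.0}

rho = marker_order_correlation(component_lg, composite_lg)
print(round(rho, 3))  # 0.9
```

In the example, a single swap between two tightly linked markers among five shared markers lowers the correlation to 0.9, which shows that local inversions within a small interval have a limited effect on the coefficient when many markers are shared.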

The composite map will be most useful for studies in- 
volving species from subgenus Symphyomyrtus sections 
Latoangulatae and Maidenaria, because the composite 
map was built from linkage maps constructed in species 
from these sections. However, due to the high level 
of genome synteny and colinearity detected between 
species from these relatively distant sections [28,32,34], 
information from the composite map should also be ap- 
plicable to many other commercially important eucalypt 
species in closely related sections (e.g. E. camaldulensis 
from subgenus Symphyomyrtus section Exsertaria). 

Future marker integration 

A number of recent studies have focussed on the develop- 
ment of molecular markers for use in eucalypts. In 
addition to the DArT genotyping array developed for use 
in eucalypts [35], the feasibility of high-throughput SNP 
genotyping has been explored [59] and several tens of 
species-transferable EST-based SSR markers have been 
recently reported [60,61]. Furthermore, DArT genotyping 
by sequencing (GBS), which combines the complexity 
reduction method of DArT [31] with next generation se- 
quencing (NGS), and which can potentially deliver up to 
three-fold as many markers as conventional DArT geno- 
typing [see 62] is becoming a cost-competitive genotyping 
option due to the recent plummeting cost of NGS. 
Therefore, to broaden the use of the composite 
map for comparative analyses and to optimise its worth, it 
will be necessary to add new markers to the current ver- 
sion of the composite map in the future. Although beyond 
the scope of this study, it would also be valuable to com- 
pare the marker order of the composite map to maps built 
using the same data with other marker-merging software 
(e.g. BioMercator [63], CarthaGene [64] or MergeMap 
[65]). The R scripts and map marker positions of the com- 
ponent maps used in this study can be made available 
upon request. 

Conclusion 

The integration of markers from seven individual genetic 
linkage pedigrees has resulted in a composite, reference 
map for eucalypts with 4101 markers (DArT, microsatellite 
and candidate gene). Although some small marker-order 
inconsistencies exist between component maps and the composite 
map, there is relatively high agreement of marker-order 
between component maps, which indicates that 
the composite map represents a good estimation of the 
true marker positions in most cases. However, at finer 
scales (sub-cM) marker-orders may differ between com- 
ponent and composite maps due to limited statistical 
power to order such tightly linked markers. Overall, the 
genome coverage and marker density of the composite 
map greatly exceeded that achieved in any of the single 
mapping populations. It is expected that this composite 
map will provide a valuable reference map for the 
world-wide Eucalyptus research community, facilitate 
the transfer of genetic information between different 
studies and allow for the integration of DArT marker in- 
formation with other genomic resources. 

Additional files 



Additional file 1: Eucalyptus composite map details. Details of 
markers mapped in the Eucalyptus composite map. Includes the linkage 
group and position of mapped markers, marker type and the 
component map(s) in which markers were mapped. A T in the 'Multicopy marker' 
column indicates that the marker occurs on two or more linkage groups. 

Additional file 2: Composite map - component map marker 
colinearity. Marker colinearity among all six component maps and the 
Eucalyptus composite map. For each linkage group, three linkage group 
'triplets' show marker colinearity between two component maps 
(outside) and the composite map (centre). Horizontal lines on linkage 
group bars indicate marker positions and lines between linkage groups 
indicate the position of common markers. The scale bar shown is in 
Kosambi's centiMorgans. Component map names (abbreviations; see 
Table 1) are given above each linkage group. Linkage groups excluded 
from composite map construction are indicated in parentheses following 
the component map name. An asterisk indicates whether marker-order 
information from the component map was incorporated during 
composite map construction (see Methods). For the GU-Emb component 
map, superscript letters indicate whether the framework (f) or 
comprehensive (c) linkage group from this pedigree was used in 
composite map construction. 